AITopics | gc policy

Collaborating Authors

gc policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning to explore when mistakes are not allowed

Pecqueux-Guézénec, Charly, Doncieux, Stéphane, Perrin-Gilbert, Nicolas

arXiv.org Artificial IntelligenceFeb-19-2025

Goal-Conditioned Reinforcement Learning (GCRL) provides a versatile framework for developing unified controllers capable of handling wide ranges of tasks, exploring environments, and adapting behaviors. However, its reliance on trial-and-error poses challenges for real-world applications, as errors can result in costly and potentially damaging consequences. To address the need for safer learning, we propose a method that enables agents to learn goal-conditioned behaviors that explore without the risk of making harmful mistakes. Exploration without risks can seem paradoxical, but environment dynamics are often uniform in space, therefore a policy trained for safety without exploration purposes can still be exploited globally. Our proposed approach involves two distinct phases. First, during a pretraining phase, we employ safe reinforcement learning and distributional techniques to train a safety policy that actively tries to avoid failures in various situations. In the subsequent safe exploration phase, a goal-conditioned (GC) policy is learned while ensuring safety. To achieve this, we implement an action-selection mechanism leveraging the previously learned distributional safety critics to arbitrate between the safety policy and the GC policy, ensuring safe exploration by switching to the safety policy when needed. We evaluate our method in simulated environments and demonstrate that it not only provides substantial coverage of the goal space but also reduces the occurrence of mistakes to a minimum, in stark contrast to traditional GCRL approaches. Additionally, we conduct an ablation study and analyze failure modes, offering insights for future research directions.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2502.13801

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Offline Policy Learning via Skill-step Abstraction for Long-horizon Goal-Conditioned Tasks

Kim, Donghoon, Yoo, Minjong, Woo, Honguk

arXiv.org Artificial IntelligenceAug-20-2024

Goal-conditioned (GC) policy learning often faces a challenge arising from the sparsity of rewards, when confronting long-horizon goals. To address the challenge, we explore skill-based GC policy learning in offline settings, where skills are acquired from existing data and long-horizon goals are decomposed into sequences of near-term goals that align with these skills. Specifically, we present an `offline GC policy learning via skill-step abstraction' framework (GLvSA) tailored for tackling long-horizon GC tasks affected by goal distribution shifts. In the framework, a GC policy is progressively learned offline in conjunction with the incremental modeling of skill-step abstractions on the data. We also devise a GC policy hierarchy that not only accelerates GC policy learning within the framework but also allows for parameter-efficient fine-tuning of the policy. Through experiments with the maze and Franka kitchen environments, we demonstrate the superiority and efficiency of our GLvSA framework in adapting GC policies to a wide range of long-horizon goals. The framework achieves competitive zero-shot and few-shot adaptation performance, outperforming existing GC policy learning and skill-based methods.

distribution shift, gc policy, goal distribution shift, (15 more...)

arXiv.org Artificial Intelligence

2408.113

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Add feedback

Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning

Wu, Lisheng, Chen, Ke

arXiv.org Artificial IntelligenceApr-19-2024

Exploration efficiency poses a significant challenge in goal-conditioned reinforcement learning (GCRL) tasks, particularly those with long horizons and sparse rewards. A primary limitation to exploration efficiency is the agent's inability to leverage environmental structural patterns. In this study, we introduce a novel framework, GEASD, designed to capture these patterns through an adaptive skill distribution during the learning process. This distribution optimizes the local entropy of achieved goals within a contextual horizon, enhancing goal-spreading behaviors and facilitating deep exploration in states containing familiar structural patterns. Our experiments reveal marked improvements in exploration efficiency using the adaptive skill distribution compared to a uniform skill distribution. Additionally, the learned skill distribution demonstrates robust generalization capabilities, achieving substantial exploration progress in unseen tasks containing similar local structures.

exploration, skill distribution, structural information, (12 more...)

arXiv.org Artificial Intelligence

2404.12999

Country:

Europe > United Kingdom (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

Qian, Zhifeng, You, Mingyu, Zhou, Hongjun, Xu, Xuanhui, He, Bin

arXiv.org Artificial IntelligenceJul-20-2023

Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals to learn a set of skills. Despite the excellent works proposed in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackled this problem by leveraging planning algorithms to plan intermediate subgoals to augment GCRL. Their methods need two crucial requirements: (i) a state representation space to search valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, they struggle to scale to high-dimensional state space due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both affect the efficiency and performance of planning and policy learning. In the paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) is proposed to learn compact representations which disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate our REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2307.10846

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback